Towards Evolutionary Nonnegative Matrix Factorization

Authors

  • Fei Wang
  • Hanghang Tong
  • Ching-Yung Lin
Abstract

Nonnegative Matrix Factorization (NMF) techniques have aroused considerable interest in the artificial intelligence field in recent years because of their good interpretability and computational efficiency. However, in many real-world applications, the data features usually evolve smoothly over time. In this case, it would be very expensive in both computation and storage to rerun the whole NMF procedure each time the data features change. In this paper, we propose Evolutionary Nonnegative Matrix Factorization (eNMF), which aims to incrementally update the factorized matrices in a computation- and space-efficient manner as the data matrix varies. We devise such an evolutionary procedure for both asymmetric and symmetric NMF. Finally, we conduct experiments on several real-world data sets to demonstrate the efficacy and efficiency of eNMF.

Introduction

Recent years have witnessed a surge of interest in Nonnegative Matrix Factorization (NMF) in the artificial intelligence field (Lee and Seung 1999; Lee and Seung 2001; Lin 2007; Kim and Park 2008). Different from traditional spectral decomposition methods such as Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), NMF (1) is usually additive, which results in better interpretability; and (2) does not require the factorized latent spaces to be orthogonal, which allows more flexibility to adapt the representation to the data set. NMF has been successfully used in many real-world applications, such as information retrieval (Shahnaz et al. 2006), environmental study (Anttila et al. 1995), computer vision (Guillamet, Bressan, and Vitrià 2001), and computational social/network science (Wang et al. 2010). Formally, what NMF does is factorize a nonnegative data matrix into the product of two (low-rank) nonnegative latent matrices.
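As context for the factorization just described, a minimal NMF sketch using the classical multiplicative update rules of Lee and Seung (2001) might look as follows; the rank r, iteration count, and toy matrix below are illustrative choices, not values from the paper:

```python
import numpy as np

def nmf_multiplicative(X, r, n_iter=500, eps=1e-9, seed=0):
    """Factorize nonnegative X (m x n) as W @ H, with nonnegative
    W (m x r) and H (r x n), via multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Each factor is rescaled by a ratio of nonnegative matrices,
        # so nonnegativity is preserved at every step.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy rank-2 nonnegative matrix: the rank-2 factorization fits it closely.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])
W, H = nmf_multiplicative(X, r=2)
err = np.linalg.norm(X - W @ H)
```

Because the updates only multiply by nonnegative ratios, a strictly positive initialization stays nonnegative throughout, which is what yields the sparse, parts-based factors discussed below.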
As NMF requires both factorized matrices to be nonnegative, it generally leads to a sparse, parts-based representation of the original data set, which is semantically much more meaningful compared to traditional factorization/basis-learning methods. Due to the empirical and theoretical success of NMF, people have been working on many NMF extensions in the last decade to fit more application scenarios. Some representative algorithms include nonnegative sparse coding (Eggert and Korner 2004), semi- and convex NMF (Ding, Li, and Jordan 2010), and orthogonal tri-NMF (Ding et al. 2006). Many algorithms have been proposed to solve NMF, such as multiplicative updates (Lee and Seung 2001), active set (Kim and Park 2008), and projected gradient (Lin 2007). However, all these algorithms require holding the whole data matrix in main memory throughout the NMF process, which is quite inefficient in terms of storage cost when the data matrix is large (either in data size or in feature dimensionality). To solve this problem, several researchers have proposed memory-efficient online implementations of NMF in recent years (Cao et al. 2007; Wang, Li, and König 2011; Saha and Sindhwani 2010). Rather than processing all data points in a batch mode, these approaches process the data points one at a time in a streaming fashion; thus they only require enough memory to hold one data point through the whole procedure. In this paper, we consider the problem of NMF in another scenario where the data features evolve over time [1]. A straightforward solution is to rerun the whole NMF procedure at each time stamp when the data features change. However, this poses several challenges in terms of space cost, computational time, and privacy. Let X and X̃ = X + ΔX be the old and new data feature matrices, respectively.

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
In many real applications, ΔX is usually very sparse while X̃ is not [2]. It is therefore not efficient in terms of space cost to rerun NMF, since we would need to store the whole data feature matrix X̃. It is also not efficient in computation, since it requires matrix-matrix multiplications between X̃ and the two factorized matrices. What is more, this strategy becomes infeasible for privacy-sensitive applications where the whole data feature matrix X̃ might not be available at a given time stamp. For instance, Facebook's [3] ...

[1] The difference between this setting and online learning is that in online learning the data points are processed one by one, i.e., the elements in the data matrix change one column at a time. In our scenario, we allow any elements in the data matrix to change from time to time.
[2] Even if X̃ is also sparse, it is usually much denser than the ΔX matrix. See Table 1 for some examples.
[3] http://www.facebook.com/

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence
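The concrete eNMF update rules are not included in this excerpt. Purely to illustrate the setting, the sketch below warm-starts standard multiplicative updates from the previous factors when a sparse ΔX arrives, instead of refactorizing X̃ from a fresh random initialization; all dimensions, iteration counts, and the perturbation are made-up example values, not the paper's method:

```python
import numpy as np

def mu_steps(X, W, H, n_iter, eps=1e-9):
    """Standard multiplicative-update steps from given initial factors."""
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(0)
m, n, r = 50, 40, 5
X = rng.random((m, r)) @ rng.random((r, n))   # old data matrix (rank r)

# Cold start: factorize X from a random initialization.
W, H = mu_steps(X, rng.random((m, r)), rng.random((r, n)), n_iter=500)

# Sparse perturbation Delta X: only ~20 of the 2000 entries change.
dX = np.zeros((m, n))
dX[rng.integers(0, m, 20), rng.integers(0, n, 20)] = 0.1
X_new = X + dX

# Warm start on the new matrix from the old factors: a handful of
# update steps suffices, instead of hundreds from scratch.
W2, H2 = mu_steps(X_new, W.copy(), H.copy(), n_iter=30)
rel_err = np.linalg.norm(X_new - W2 @ H2) / np.linalg.norm(X_new)
```

Note that this sketch still touches the full X̃ inside each update, so it captures only the computational-savings intuition; the paper's point is that eNMF avoids even that by working with the sparse ΔX directly.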


Similar Articles

A Modified Digital Image Watermarking Scheme Based on Nonnegative Matrix Factorization

This paper presents a modified digital image watermarking method based on nonnegative matrix factorization. First, the host image is factorized into the product of three nonnegative matrices. Then, the central matrix is transferred to the discrete cosine transform domain. The watermark is embedded in the low-frequency band of this matrix, and the inverse transform is then computed. Finally, the watermarked ...


A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a common method in data mining that has been used in different applications as a dimension-reduction, classification, or clustering method. Alternating least squares (ALS) methods are usually used to solve this non-convex minimization problem. At each step of an ALS algorithm, two convex least-squares problems must be solved, which causes high com...
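The abstract above describes projected ALS only at a high level. A generic projected-ALS sketch for NMF (not the cited paper's specific algorithm; the test matrix and rank are illustrative) alternates between the two convex least-squares subproblems and clips negatives after each solve:

```python
import numpy as np

def nmf_projected_als(X, r, n_iter=100, seed=0):
    """Projected ALS for NMF: alternately solve each least-squares
    subproblem exactly, then project the result onto the nonnegative
    orthant by zeroing negative entries."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))
    H = None
    for _ in range(n_iter):
        # Convex subproblem: min_H ||X - W H||_F, then clip.
        H = np.maximum(np.linalg.lstsq(W, X, rcond=None)[0], 0)
        # Convex subproblem: min_W ||X^T - H^T W^T||_F, then clip.
        W = np.maximum(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 0)
    return W, H

# Exactly rank-4 nonnegative test matrix.
rng = np.random.default_rng(1)
X = rng.random((20, 4)) @ rng.random((4, 15))
W, H = nmf_projected_als(X, r=4)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Unlike multiplicative updates, the clipping step means the objective is not guaranteed to decrease monotonically, which is part of what ALS variants for NMF try to address.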


A new approach for building recommender system using non negative matrix factorization method

Nonnegative matrix factorization is a new approach to reducing data dimensionality. In this method, by exploiting the nonnegativity of the matrix data, the matrix is decomposed into components that are more interrelated, dividing the data into sections in which the entries have a specific relationship. In this paper, we use nonnegative matrix factorization to decompose the user ratin...
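The rating-matrix decomposition described above can be sketched with a masked variant of the multiplicative updates, in which only observed ratings contribute to the fit; the toy matrix, rank, and zero-means-unobserved convention are assumptions for illustration, and the cited paper's actual model may differ:

```python
import numpy as np

def masked_nmf(R, M, r, n_iter=500, eps=1e-9, seed=0):
    """Weighted multiplicative updates: the binary mask M marks observed
    ratings, so only those entries contribute to the reconstruction error."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ (M * R)) / (W.T @ (M * (W @ H)) + eps)
        W *= ((M * R) @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

# 4 users x 5 items; 0 denotes an unobserved rating.
R = np.array([[5., 4., 0., 1., 1.],
              [4., 5., 1., 0., 1.],
              [1., 1., 0., 5., 4.],
              [0., 1., 5., 4., 5.]])
M = (R > 0).astype(float)
W, H = masked_nmf(R, M, r=2)
pred = W @ H   # predictions for every cell, observed or not
```

The low-rank product fills in the unobserved cells from the patterns shared across users, which is the standard way NMF is used for rating prediction.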


A Multiscale Approach for Nonnegative Matrix Factorization with Applications to Image Classification

We use a multiscale approach to reduce the time to produce the nonnegative matrix factorization (NMF) of a matrix A, that is, A ≈ WH. We also investigate QR factorization as a method for initializing W during the iterative process for producing the nonnegative matrix factorization of A. Finally, we use our approach to produce nonnegative matrix factorizations for classifying images and compare ...


Nonnegative Matrix and Tensor Factorization

There has been a recent surge of interest in matrix and tensor factorization (decomposition), which provides meaningful latent (hidden) components or features with physical or physiological meaning and interpretation. Nonnegative matrix factorization (NMF) and its extension to three-dimensional (3-D) nonnegative tensor factorization (NTF) attempt to recover hidden nonnegative common structures...




Publication date: 2011